1
Data Science Reduces
Anatomic Pathology Reporting Errors
Session # 270, February 14, 2019
Jay J. Ye, MD, PhD, Pathologist
Dahl-Chase Pathology Associates
2
Jay J. Ye, MD, PhD
Has no real or apparent conflicts of interest
to report.
Conflict of Interest
3
Introduction:
- Pathology process: specimens to reports
- Types of errors: interpretation vs reporting
- Data science: why it is effective
Catching the errors:
- Two programs: web application / auto-email
- Mechanisms
Bigram approach
Implications
Agenda
4
Define pathology reporting errors and
describe their characteristics
Describe data science and tools of data
science
Describe how data science is used to
reduce reporting errors
Learning Objectives
5
Accessioning: entering demographic
and specimen info
Gross examination, sectioning, and
submission of specimens: specimens
to cassettes
Histology: tissue in cassettes to glass
slides
Interpretation: from slides to final
reports
The process of anatomic pathology:
from specimens to reports
6
Example of a report
7
Interpretation errors vs
reporting errors
Interpretation errors
Professional judgement
Reporting errors
Not professional judgement
8
Benign breast lesion vs. breast cancer
Benign mole vs melanoma
Positive vs negative margin
Missing the lesion
Examples of interpretation errors
9
Patient ID, left vs right (not covered here)
Typos / voice recognition errors
Internal inconsistency within the report
Paraffin block designation errors
Unintentional omission of special
studies (immunohistochemistry, special stains, etc.)
Incorrect patient sex, etc.
Examples of reporting errors
10
Professional training
Subspecialization
Policies and procedures, including certain
mandatory reviews.
……
To reduce interpretation errors
11
Not related to professional ability
Sometimes difficult to catch
Generally obvious after being pointed out
Characteristics of reporting errors
12
Annoying clinicians
Conveying incorrect information
Reducing the trustworthiness of the report
Compliance issues (special studies
performed and billed but not reported)
Harmful effects of reporting errors
13
Data science and its tools
1. Data science extracts concise
knowledge from lots of data
2. Tools: R, Python, etc.
[Figure: lots of data (streams of 0s and 1s) distilled into knowledge]
14
Data science: great at extracting
knowledge from large quantities of data
(including finding a needle in a haystack)
Reporting errors: each particular error
is rare, and therefore sometimes difficult
to catch (a needle-in-a-haystack problem)
Using data science tools to catch errors
Pairing data science with
reporting error detection
15
Report checker for PAs: a web application
for pathologists' assistants to check for
errors in the preliminary reports.
Email notification to pathologists: alerts
pathologists to special studies not
reported in the final reports.
Two programs for error catching:
preliminary and final reports
16
Two ways to interact with data
17
Report Checker
18
Internal web address: cronos:8252
Report checker
19
Voice Recognition Errors in
Clinical Information
20
Voice Recognition Errors in Clinical
Information: Sounds Like
Specimen list / clinical information
discrepancy due to phonetic similarity:
lid/lip
chin/shin
thigh/side
thigh/thyroid
21
Examples of voice recognition errors in gross descriptions
… with a 1.2 x 0.3 sodium segment of
a 0.7 x 0.4 x 0.1 tan tan-gray skin shave
… right hemicolectomy consisting of 9.5 cm
maternal ileum, 37.5 cm of cecum…
22
Block Designation Errors
1A 2A 3A 4A-4B
23
Block error 1: Omission
Part 3. …The largest fragment is inked at the
base, trisected 3A.
** Missing 3B,3C,3D
1A lateral margin, perpendicular.
1B-1L central sections each to …
1K medial margin, perpendicular
**missing 1M
24
Block error 2: Addition
Part 1. …The tissue is poured into a
specimen bag. Intact 1A-1C
Should have been 1A-1B (1C does
not exist)
Part 6. … entirely serially submitted as
6A-6C with the tips, perpendicular within
6A and 6D
6D should have been 6C (6D does
not exist)
25
Block error 3: Ambiguity
1B-1C lesion entirely submitted
1C-1D representative fragmented sections
1C-1D lesion entirely submitted.
1F nearest radial circumferential…
1F representative section of mesenteric margin
1G-1H proximal polypectomy site.
26
Wrong Sex
[Figure: report screenshot showing the Name, Sex (Female vs. Male), and report fields]
27
Every 5 minutes, examine reports finalized
by pathologists for additional studies
performed but not reported
Email the pathologist when such cases
are found
Automatic emails to pathologists
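A minimal Python sketch of the five-minute polling loop follows. The slides do not show this program's code: the report fields (`case_id`, `pathologist_email`) and the `fetch_new_finals`, `find_unreported`, and `send` callables are hypothetical placeholders for the database query, the stain comparison, and the mail transport. Only the subject-line format is taken from the example emails.

```python
import time
from email.message import EmailMessage


def build_alert(case_id, stains, to_addr):
    """Compose an alert using the subject format seen in the example emails."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"Stains not reported: {case_id} {' | '.join(stains)}"
    msg.set_content(
        "Performed but not mentioned in the final report: " + ", ".join(stains)
    )
    return msg


def poll(fetch_new_finals, find_unreported, send, interval_s=300, max_cycles=None):
    """Check newly finalized reports every interval_s seconds (300 s = 5 min).

    fetch_new_finals() -> iterable of report dicts (hypothetical fields
    "case_id" and "pathologist_email"); find_unreported(report) -> list of
    stain names; send(msg) delivers an EmailMessage (e.g. via smtplib).
    max_cycles=None runs forever; a number is handy for testing.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for report in fetch_new_finals():
            missing = find_unreported(report)
            if missing:
                send(build_alert(report["case_id"], missing, report["pathologist_email"]))
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_s)
```

Passing the three callables in keeps the loop independent of any particular database or mail server.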
28
Subject: Stains not reported: S-17-##### Keratin AE1/AE3
Original: nothing mentioned
Added: “Cytokeratin AE1/3 confirms the
presence of invasive carcinoma extending to the
inferior margin.”
Subject: Stains not reported: S-18-##### CYCLIN D-1 | KAPPA-ISH | LAMBDA-ISH
Original: “CD138 highlights plasma cells which are
polyclonal on K/L ISH.”
Revised: “CD138 highlights plasma cells which are
polyclonal on kappa/lambda ISH. Cyclin D-1 is
negative.”
Emails sent and changes made
29
Subject: Stains not reported: S-17-##### CD30
… CD20 positive, PAX-5 positive B cells
which also express BCL-6 and BCL-2
(dim). The atypical cells are negative for
CD5, CD10, CD38, cyclin D1 and EBV-ish.
MUM1 shows borderline positivity in
a subset of the cells. Ki-67 shows a
proliferation index of approximately 80%.
CD3 stains … (1 out of 12 stains)
Sometimes it catches errors
30
Voice recognition errors: 7-8%
Block designation errors: 1%
Stains not reported: on average 1 per day
Sex errors: rare
Prevalence of errors by types
31
Mechanisms for error catching
Example of SQL and data in a table
32
Query: SQL wrapped in R
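The query step can be sketched with Python's built-in `sqlite3` module, standing in for SQL wrapped in R. The table `reports` and its columns are hypothetical, since the actual information-system schema is not shown, and the sample rows are made up for illustration. The `?` placeholders keep the query parameterized rather than pasting values into the SQL string.

```python
import sqlite3

# Hypothetical schema; the real LIS tables and column names are not given.
SCHEMA = """
CREATE TABLE reports (
    case_id      TEXT,
    section      TEXT,   -- e.g. 'gross', 'clinical', 'diagnosis'
    report       TEXT,
    finalized_at TEXT    -- ISO 8601 timestamp, compared as text
)
"""

QUERY = """
SELECT case_id, report
FROM reports
WHERE section = ? AND finalized_at >= ?
ORDER BY case_id
"""


def fetch_reports(conn, section, since):
    """Run the parameterized query and return a list of (case_id, report) rows."""
    return conn.execute(QUERY, (section, since)).fetchall()


def demo_connection():
    """In-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO reports VALUES (?, ?, ?, ?)",
        [
            ("case-1", "gross", "Received in formalin ...", "2019-02-01T09:00:00"),
            ("case-2", "gross", "... with a 1.2 x 0.3 sodium segment of ...", "2019-02-14T10:30:00"),
            ("case-2", "clinical", "Pass pending", "2019-02-14T10:30:00"),
        ],
    )
    return conn
```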
33
Processing Retrieved Data: R code
34
1. Error patterns: reading the text
- Exact text: e.g., “Pass pending” (for “Path
pending”)
- Text patterns: e.g., “23.5 segment
of colon” (no unit: cm? mm?)
2. Conflicting information: requires
information from different sources
Patient sex, block submission, special
studies not reported, wrong providers,
etc.
Basis for error identification
35
1. List items in a .txt file
epidermal edema
Pass pending
(The correct phrases are “dermal edema”
and “path pending”.)
2. Read the .txt file into R, joining the entries
with "|" and assigning the result to the
variable clinicalerrors:
clinicalerrors <- "…|epidermal edema|Pass pending|…"
Detecting errors by exact matching
36
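3. Identify any report whose clinical
information contains these phrases:
library(stringr)
library(purrr)
# collect every matched phrase per report, joined by "/" ("" if none)
clinicalVRE2 <- map_chr(str_extract_all(clinical_rpt$report, clinicalerrors),
                        ~ paste(.x, collapse = "/"))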
Detecting errors by exact matching
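A Python equivalent of the exact-matching steps might look like this. Unlike the R string above, it escapes each phrase so characters such as `.` or `(` are matched literally; `load_phrases`, `build_pattern`, and `flag_reports` are hypothetical names chosen for this sketch.

```python
import re
from pathlib import Path


def load_phrases(path):
    """Read one error phrase per line from the .txt file, ignoring blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_pattern(phrases):
    """Join the phrases into one alternation, like "a|b|c" in the R version.

    re.escape makes each phrase match literally; longer phrases go first so
    that a shorter overlapping phrase does not win at the same position.
    """
    if not phrases:
        return re.compile(r"(?!x)x")  # an empty list should match nothing
    ordered = sorted(set(phrases), key=len, reverse=True)
    return re.compile("|".join(re.escape(p) for p in ordered))


def flag_reports(reports, pattern):
    """For each report, return all matched phrases joined by "/" ("" if none)."""
    return ["/".join(pattern.findall(text)) for text in reports]
```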
37
Users can teach the program to catch more
errors over time by adding new
entries to the .txt file
Users can teach the program
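Teaching the program then amounts to adding vetted phrases to the list file. This sketch (hypothetical `merge_phrases` and `update_phrase_file`) skips blank and duplicate entries so the file stays clean as users add to it.

```python
from pathlib import Path


def merge_phrases(existing_lines, new_phrases):
    """Keep the existing non-blank entries and append new, non-duplicate phrases."""
    merged = [line.strip() for line in existing_lines if line.strip()]
    seen = set(merged)
    for phrase in new_phrases:
        phrase = phrase.strip()
        if phrase and phrase not in seen:
            merged.append(phrase)
            seen.add(phrase)
    return merged


def update_phrase_file(path, new_phrases):
    """Rewrite the .txt list with the merged entries, one per line."""
    p = Path(path)
    existing = p.read_text(encoding="utf-8").splitlines() if p.exists() else []
    merged = merge_phrases(existing, new_phrases)
    p.write_text("\n".join(merged) + "\n", encoding="utf-8")
    return merged
```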
38
Regular expressions can be used to
match patterns of text, e.g. a dimension
([0-9]{1,2}[.][0-9]) not followed by “cm” or “mm”:
# [a-bd-z] excludes "c" and [a-ln-z] excludes "m", so "cm" and "mm" are not matched
no_cm <- str_match(block_rpt$report,
  " x [0-9]{1,2}[.][0-9] [a-bd-z][a-ln-z]([a-zA-Z,. ]{5})?")[, 1]
Patterns cannot easily be added to the
.txt file by users
Detecting errors by regular expression
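The same check in Python's `re` module, with plain letter ranges inside the character classes (parentheses inside `[...]` would be matched literally). `re.search` returns the first match, like `str_match(...)[, 1]`, and `None` where R would return `NA`; `find_missing_units` is a hypothetical name.

```python
import re

# [a-bd-z] rules out "c" as the first letter and [a-ln-z] rules out "m" as the
# second, so dimensions followed by "cm" or "mm" are skipped.
NO_UNIT = re.compile(r" x [0-9]{1,2}[.][0-9] [a-bd-z][a-ln-z]([a-zA-Z,. ]{5})?")


def find_missing_units(reports):
    """Return the first suspicious match per report, or None (R would give NA)."""
    results = []
    for text in reports:
        match = NO_UNIT.search(text)
        results.append(match.group(0) if match else None)
    return results
```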
39
Examples of checking by inconsistency:
Block checking: blocks submitted in
database vs information parsed from
gross description text
Sex checking: sex in database vs naming
convention and report text
Stains not reported: stains billed in
database vs info parsed from the
diagnosis text
Conflicting information
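A sketch of the block check, assuming block designations are written as a part number plus one capital letter (`3A`) and ranges as `1B-1L` within one part. Comparing blocks parsed from the gross text with blocks recorded in the database surfaces the three error types shown earlier: omission, addition, and ambiguity (here approximated as a block named more than once). The regular expression and function names are illustrative, not the program's actual code.

```python
import re
from collections import Counter

# Assumed notation: part number + one capital letter ("3A"), optionally a
# range within the same part ("1B-1L").
BLOCK = re.compile(r"\b(\d{1,2})([A-Z])(?:\s*-\s*(\d{1,2})([A-Z]))?\b")


def blocks_in_text(text):
    """Expand every block designation in the gross text, keeping repeats."""
    found = []
    for part, first, part2, last in BLOCK.findall(text):
        if part2 and part2 != part:
            continue  # cross-part ranges are not handled in this sketch
        end = last or first
        found.extend(f"{part}{chr(c)}" for c in range(ord(first), ord(end) + 1))
    return found


def check_blocks(gross_text, db_blocks):
    """Compare blocks named in the text with blocks recorded in the database."""
    named = blocks_in_text(gross_text)
    counts = Counter(named)
    return {
        "omitted": sorted(set(db_blocks) - set(named)),      # in database, not in text
        "nonexistent": sorted(set(named) - set(db_blocks)),  # in text, not in database
        "repeated": sorted(b for b, n in counts.items() if n > 1),  # possible ambiguity
    }
```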
40
Stains not reported, as an example:
1. Stains retrieved from the database:
CD3 / ER Quant / P63-Double Stain /
S100-Brown /CYCLIN D-1
2. A .txt file, each line mapping a stain name to
its possible wordings in the reports:
CYCLIN D-1,CYCLIN D1|cyclin d1|cyclin D1|Cyclin D1|cyclin D-1|Cyclin D-1|Cyclin-D1|cyclind-1|cyclinD-1|cyclinD1|cyclin-D1
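A Python sketch of the comparison step: parse each mapping line into a stain name and a pattern, then report billed stains whose wording does not occur in the diagnosis text. The word-boundary guards (so `CD3` does not match inside `CD30`) and the fallback to the stain's own name when no mapping line exists are assumptions added for this sketch.

```python
import re

EDGE_L = r"(?<![A-Za-z0-9])"  # not preceded by a letter or digit
EDGE_R = r"(?![A-Za-z0-9])"   # not followed by a letter or digit


def parse_stain_map(lines):
    """Parse lines of the form 'STAIN NAME,variant1|variant2|...'."""
    stain_map = {}
    for line in lines:
        stain, sep, variants = line.strip().partition(",")
        alts = [v.strip() for v in variants.split("|") if v.strip()]
        if not sep or not alts:
            continue  # skip blank or malformed lines
        body = "|".join(re.escape(v) for v in alts)
        stain_map[stain.strip()] = re.compile(EDGE_L + "(?:" + body + ")" + EDGE_R)
    return stain_map


def unreported_stains(billed, diagnosis_text, stain_map):
    """Return billed stains whose wording does not appear in the diagnosis text."""
    missing = []
    for stain in billed:
        pattern = stain_map.get(stain)
        if pattern is None:  # assumption: fall back to the stain's own name
            pattern = re.compile(EDGE_L + re.escape(stain) + EDGE_R, re.IGNORECASE)
        if not pattern.search(diagnosis_text):
            missing.append(stain)
    return missing
```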
41
For the sentence:
Below you will find some important
instructions about HIMSS19 session
information, registration, hotel and travel.
Word bigrams:
Below you / you will / will find / find some
…… / registration hotel / hotel and / and
travel
Newly implemented bigram approach
- word bigrams
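Word bigrams for the sentence above can be produced as follows. The tokenizer (runs of letters and digits, with `.`, `-`, or `'` kept only inside a token) is an assumption; punctuation between words is simply dropped, which is why `registration hotel` appears as a bigram.

```python
import re

# Assumed tokenizer: runs of letters/digits; ".", "-" or "'" are kept only
# inside a token (e.g. "0.3", "tan-gray"), so other punctuation is dropped.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[.\-'][A-Za-z0-9]+)*")


def word_bigrams(sentence):
    """Return the consecutive word pairs of a sentence."""
    words = TOKEN.findall(sentence)
    return list(zip(words, words[1:]))


sentence = ("Below you will find some important instructions about HIMSS19 "
            "session information, registration, hotel and travel.")
for first, second in word_bigrams(sentence):
    print(first, second)
```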
42
1. Retrieving gross description text for 200k
specimens.
2. Preprocessing these texts:
- removing patient info, specimen label,
all capitalized words / acronyms
- Changing all the digits to 8, except
when it is 1
3. Constructing a bigram library of ~ 50k
Construction of normal bigrams
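The construction steps can be sketched as below, under several assumptions noted in the comments: an acronym is a token of two or more characters starting with a capital and containing only capitals and digits; "except when it is 1" is read literally, so the digit 1 is left unchanged; bigrams are not formed across sentence boundaries; and patient information and specimen labels are passed in as strings to strip, since the slides do not show how they are identified.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z0-9]+(?:[.\-'][A-Za-z0-9]+)*")
# Assumption: an acronym is 2+ characters, starting with a capital, all capitals/digits.
ACRONYM = re.compile(r"\b[A-Z][A-Z0-9]+\b")


def preprocess(text, remove=()):
    """Normalize one gross description before counting bigrams."""
    for s in remove:  # hypothetical: patient info / specimen label strings
        if s:
            text = text.replace(s, " ")
    text = ACRONYM.sub(" ", text)
    # Assumption: "except when it is 1" read literally, so the digit 1 is kept.
    return re.sub(r"[02-9]", "8", text)


def split_sentences(text):
    """Split after . ; : ! or ? followed by whitespace (so "1.8" stays intact)."""
    return [s for s in re.split(r"(?<=[.;:!?])\s+", text) if s.strip()]


def sentence_bigrams(sentence):
    words = TOKEN.findall(sentence)
    return list(zip(words, words[1:]))


def build_library(texts, min_count=1):
    """Set of bigrams seen at least min_count times across the normalized texts."""
    counts = Counter()
    for text in texts:
        for sent in split_sentences(preprocess(text)):
            counts.update(sentence_bigrams(sent))
    return {bg for bg, n in counts.items() if n >= min_count}
```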
43
Examples of “normal” unique bigrams
in the bigram library
44
Than mature: The endometrial cavity …
mass identified: A well-formed mass is not
identified
Example of Report checker output
by bigram approach
45
Two weeks of gross description texts (1,717
cases)
1. 0.5% of bigrams were not in the library;
these flagged 3% of the sentences
2. 10% of the flagged sentences contained
errors, affecting 1.9% of the cases
Checking new gross texts for
bigrams not in the library
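Checking new text against the library might look like this: each sentence is normalized the same way as the library texts, and any sentence containing a bigram absent from the library is flagged. `unseen_rates` returns the two proportions reported above (unseen bigrams, flagged sentences). The normalization repeats the assumptions of the construction sketch and is illustrative only.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9]+(?:[.\-'][A-Za-z0-9]+)*")
ACRONYM = re.compile(r"\b[A-Z][A-Z0-9]+\b")
SENTENCE_END = re.compile(r"(?<=[.;:!?])\s+")


def normalize(sentence):
    """Apply the same (assumed) normalization used when building the library."""
    return re.sub(r"[02-9]", "8", ACRONYM.sub(" ", sentence))


def sentence_bigrams(sentence):
    words = TOKEN.findall(normalize(sentence))
    return list(zip(words, words[1:]))


def split_sentences(text):
    return [s for s in SENTENCE_END.split(text) if s.strip()]


def flag_sentences(text, library):
    """Return (original sentence, unseen bigrams) for every flagged sentence."""
    flagged = []
    for sent in split_sentences(text):
        unseen = [bg for bg in sentence_bigrams(sent) if bg not in library]
        if unseen:
            flagged.append((sent, unseen))
    return flagged


def unseen_rates(text, library):
    """Share of bigrams not in the library, and share of sentences flagged."""
    sents = split_sentences(text)
    all_bgs = [bg for s in sents for bg in sentence_bigrams(s)]
    n_unseen = sum(bg not in library for bg in all_bgs)
    n_flagged = sum(any(bg not in library for bg in sentence_bigrams(s)) for s in sents)
    return (n_unseen / len(all_bgs) if all_bgs else 0.0,
            n_flagged / len(sents) if sents else 0.0)
```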
46
Confusing errors (error / intended wording)
where from margin / away from margin
uterine process / uterine corpus
port material / soft material
Very minor errors
is an blue / is a blue
is it is 0.3 x 0.2 cm / is a 0.3 x 0.2 cm
Examples of the errors
47
Select the entries with no mistake (darker color) and then click the
button below to teach the program
Checker learning: users updating
the bigram library
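Teaching the bigram checker is then a set update: entries a user marks as error-free are added to the library, so they are no longer flagged. The one-bigram-per-line text format used by `save_library` and `load_library` is a hypothetical choice for this sketch.

```python
from pathlib import Path


def teach(library, approved):
    """Add bigrams a user marked as error-free; return how many were new."""
    before = len(library)
    library.update(tuple(bg) for bg in approved)
    return len(library) - before


def save_library(library, path):
    """Hypothetical storage format: one 'first second' bigram per line."""
    lines = (f"{a} {b}" for a, b in sorted(library))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_library(path):
    """Read the format written by save_library back into a set of tuples."""
    text = Path(path).read_text(encoding="utf-8")
    return {tuple(line.split(" ", 1)) for line in text.splitlines() if line.strip()}
```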
48
Other methods: searching for needles in a
haystack (you need to know what
needles look like).
Bigram method: identifying anything that is not
hay (you only need to know what hay
looks like, so it can catch errors never seen
before).
Bigram method vs other methods
49
Implications
More errors than we think: bigram / sex checks
Universally implementable: works from the database,
independent of the information system
High “benefit-cost ratio”: very little
additional effort
Substantive effects on patient care:
Avoid conveying wrong information
Enable PA and pathologists to focus on
the professional component of the tasks
50
Jay J. Ye, MD, PhD, Pathologist
Dahl-Chase Pathology Associates
jye@dahlchase.com
(207) 941-8200
I greatly appreciate the staff in the gross room for their
dedication and effort in reducing the number of errors in the
preliminary reports
Please complete the online session evaluation
Questions